Near-optimal Distributions for Data Matrix Sampling
Abstract
We give near-optimal distributions for the sparsification of large $m \times n$ matrices, where $m \ll n$, for example representing $n$ observations over $m$ attributes. Our algorithms can be applied when the non-zero entries are only available as a stream, i.e., in arbitrary order, and result in matrices which are not only sparse, but whose values are also highly compressible. In particular, algebraic operations with the resulting matrices can be implemented as (ultra-efficient) operations over indices.
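The abstract does not state the distributions themselves, so the following Python sketch uses a simpler, well-known stand-in: magnitude-proportional (L1) sampling, where each streamed entry $a$ is kept independently with probability $p = \min(1, s|a|/\|A\|_1)$ and rescaled to $a/p$. It assumes $\|A\|_1$ is known before the stream starts (for example, from an earlier pass), and the function names are hypothetical. It illustrates two claims from the abstract: entries can arrive in any order, and every entry kept with $p < 1$ ends up with the same magnitude $\|A\|_1/s$, so it can be stored as an index plus a sign.

```python
import random


def l1_sparsify_stream(entries, l1_norm, s, seed=0):
    """Sparsify a matrix given as a stream of (i, j, value) triples.

    Each non-zero entry is kept independently with probability
    p = min(1, s * |a| / l1_norm) and, if kept, rescaled to a / p,
    so the sketch B satisfies E[B] = A. Stream order does not matter.
    """
    rng = random.Random(seed)
    sketch = {}
    for i, j, a in entries:
        if a == 0:
            continue
        p = min(1.0, s * abs(a) / l1_norm)
        if rng.random() < p:
            sketch[(i, j)] = a / p
    return sketch


def compress(sketch, l1_norm, s):
    """Split the sketch into one shared magnitude plus per-entry signs.

    An entry kept with p < 1 has value a / p = sign(a) * l1_norm / s,
    so only its index and sign need storing. Entries kept with p = 1
    (|a| >= l1_norm / s) are stored exactly.
    """
    scale = l1_norm / s
    signs, exact = {}, {}
    for key, v in sketch.items():
        if abs(abs(v) - scale) <= 1e-12 * scale:
            signs[key] = 1 if v > 0 else -1
        else:
            exact[key] = v
    return scale, signs, exact


# Hypothetical stream; ||A||_1 is assumed known from an earlier pass.
entries = [(0, 0, 3.0), (0, 1, -1.0), (1, 0, 0.5), (1, 2, -2.5)]
l1 = sum(abs(a) for _, _, a in entries)
B = l1_sparsify_stream(entries, l1, s=2)
scale, signs, exact = compress(B, l1, s=2)
```

Because each entry is decided on its own, no buffering or reordering of the stream is needed; the price of this simple rule is that it is not the near-optimal distribution the abstract refers to.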
Similar papers
Near-Optimal Entrywise Sampling for Data Matrices
We consider the problem of selecting non-zero entries of a matrix $A$ in order to produce a sparse sketch of it, $B$, that minimizes $\|A - B\|_2$. For large $m \times n$ matrices such that $n \gg m$ (for example, representing $n$ observations over $m$ attributes), we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding $A$. Second, they...
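As a hedged illustration of the objective $\|A - B\|_2$, the sketch below draws a sparse, unbiased sketch $B$ and measures its spectral-norm error with NumPy. The sampling rule is the textbook squared-magnitude (L2) rule $p_{ij} = \min(1, s A_{ij}^2/\|A\|_F^2)$, chosen only because it is simple; it is not one of the distributions described in the abstract, and the function names and matrix sizes are hypothetical.

```python
import numpy as np


def l2_sample_sketch(A, s, rng):
    """Keep A[i, j] independently with p_ij = min(1, s * A[i, j]**2 / ||A||_F**2)
    and rescale kept entries by 1 / p_ij, so that E[B] = A entrywise.
    The expected number of kept entries is at most s."""
    A = np.asarray(A, dtype=float)
    p = np.minimum(1.0, s * A**2 / np.sum(A**2))
    keep = rng.random(A.shape) < p  # zero entries have p = 0 and are never kept
    B = np.zeros_like(A)
    B[keep] = A[keep] / p[keep]
    return B


def spectral_error(A, B):
    """||A - B||_2, the largest singular value of A - B."""
    diff = np.asarray(A, dtype=float) - np.asarray(B, dtype=float)
    return np.linalg.norm(diff, ord=2)


# Hypothetical example: m = 20 attributes, n = 2000 observations (m << n).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 2000))
B = l2_sample_sketch(A, s=8000, rng=rng)
rel_err = spectral_error(A, B) / np.linalg.norm(A, ord=2)
```

Raising `s` keeps more entries and lowers the error; the interesting question the abstract addresses is which distribution gets the smallest error for a given number of kept entries.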
EIGENVECTORS OF COVARIANCE MATRIX FOR OPTIMAL DESIGN OF STEEL FRAMES
In this paper, the discrete method of eigenvectors of covariance matrix has been used for the weight minimization of steel frame structures. The Eigenvectors of Covariance Matrix (ECM) algorithm is a robust, iterative method for solving optimization problems and is inspired by the CMA-ES method. Both of these methods use a covariance matrix in the optimization process, but the covariance matrix calcula...
Lipschitz Density-Ratios, Structured Data, and Data-driven Tuning
Density-ratio estimation (i.e., estimating $f = f_Q/f_P$ for two unknown distributions $Q$ and $P$) has proved useful in many machine learning tasks, e.g., risk calibration in transfer learning and two-sample tests, and also in common techniques such as importance sampling and bias correction. While there are many important analyses of this estimation problem, the present paper derives convergence rat...
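To show why an estimate of $f = f_Q/f_P$ is useful for importance sampling, here is a minimal sketch that estimates $\mathbb{E}_Q[g(X)]$ using only draws from $P$, weighting each draw by $f(x)$. It assumes both densities are known Gaussians ($P = N(0,1)$, $Q = N(1,1)$, so $f(x) = e^{x - 1/2}$); the paper itself is about estimating $f$ from data, which this sketch does not attempt. The function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def importance_estimate(g, ratio, samples_from_p):
    """Estimate E_Q[g(X)] from draws of P, weighting each draw by f(x) = f_Q(x) / f_P(x)."""
    return sum(g(x) * ratio(x) for x in samples_from_p) / len(samples_from_p)


def true_ratio(x):
    """Known density ratio f_Q / f_P for Q = N(1, 1) and P = N(0, 1)."""
    return gaussian_pdf(x, 1.0, 1.0) / gaussian_pdf(x, 0.0, 1.0)


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # draws from P only
estimate = importance_estimate(lambda x: x, true_ratio, xs)  # targets E_Q[X] = 1
```

In practice `true_ratio` would be replaced by an estimated ratio, and the quality of that estimate directly controls the bias and variance of the weighted average.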
Comparison of Optimal Design Methods in Inverse Problems.
Typical optimal design methods for inverse or parameter estimation problems are designed to choose optimal sampling distributions through minimization of a specific cost function related to the resulting error in parameter estimates. It is hoped that the inverse problem will produce parameter estimates with increased accuracy using data collected according to the optimal sampling distribution. ...
A Simple Approach to Optimal CUR Decomposition
Prior optimal CUR decomposition and near-optimal column reconstruction methods have been established by combining BSS sampling and adaptive sampling. In this paper, we propose a new approach to optimal CUR decomposition and near-optimal column reconstruction that uses only leverage score sampling. In our approach, neither BSS sampling nor adaptive sampling is needed. Moreover, our appr...
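A rough sketch of leverage-score sampling for a CUR decomposition, assuming NumPy: columns (and, via $A^\top$, rows) are drawn with probability proportional to their rank-$k$ leverage scores, and $U = C^{+} A R^{+}$ is the Frobenius-optimal middle factor for the chosen $C$ and $R$. This is a generic illustration, not the algorithm of the paper above; for simplicity it samples without replacement using `numpy.random.Generator.choice`, and all function names are hypothetical.

```python
import numpy as np


def column_leverage_scores(A, k):
    """Rank-k column leverage scores: squared column norms of the top-k
    right singular vectors (first k rows of Vt). They lie in [0, 1] and sum to k."""
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(vt[:k, :] ** 2, axis=0)


def sample_by_leverage(A, k, c, rng):
    """Pick c distinct column indices of A, weighted by leverage score
    (sequential weighted sampling without replacement, a simplification)."""
    scores = column_leverage_scores(A, k)
    idx = rng.choice(A.shape[1], size=c, replace=False, p=scores / scores.sum())
    return np.sort(idx)


def leverage_cur(A, k, c, r, rng):
    """CUR sketch: C = sampled columns, R = sampled rows, U = C^+ A R^+."""
    cols = sample_by_leverage(A, k, c, rng)
    rows = sample_by_leverage(A.T, k, r, rng)  # row scores are column scores of A^T
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R


# Hypothetical example: an exactly rank-3 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 40))
C, U, R = leverage_cur(A, k=3, c=3, r=3, rng=rng)
err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
```

On an exactly rank-$k$ matrix, picking $k$ linearly independent columns and rows makes $CUR$ reproduce $A$ up to rounding; on real data the error depends on $k$, $c$, and $r$.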